Skip to content

feat: cdp keep alive and exit strategy#365

Merged
shadowfax92 merged 4 commits intomainfrom
feat/fix-cdp-die
Feb 26, 2026
Merged

feat: cdp keep alive and exit strategy#365
shadowfax92 merged 4 commits intomainfrom
feat/fix-cdp-die

Conversation

@shadowfax92
Copy link
Contributor

No description provided.

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 26, 2026

Greptile Summary

This PR adds two resilience features to the CDP connection layer: a keepalive mechanism that detects zombie WebSocket connections (by periodically sending Browser.getVersion and calling handleDeadConnection on timeout), and an exit strategy that calls process.exit when reconnection is exhausted or when no health check arrives within 5 minutes. Per-request timeouts are also introduced to prevent rawSend calls from hanging indefinitely.

Key changes:

  • cdp.ts: Adds startKeepalive/stopKeepalive, handleDeadConnection, per-request timers on PendingRequest, stale-socket guard in attemptConnect, and renames reconnectOrCrashreconnectWithRetries with disconnecting early-exit checks.
  • health.ts: Adds a watchdog timer that exits the process if no /health ping is received for 5 minutes; response now includes cdpConnected status.
  • timeouts.ts / limits.ts: New shared constants for reconnect delay, keepalive interval/timeout, request timeout, and reconnect retry count.
  • browser.ts: Exposes isCdpConnected() delegating to cdp.isConnected().

Issues found:

  • HEALTH_CHECK_TIMEOUT in health.ts is a local magic constant; per CLAUDE.md all timeout values must live in @browseros/shared/constants/timeouts. All other new timeouts in this PR were correctly placed there.
  • The watchdog timer returned by createHealthRoute has no cleanup path. If a graceful shutdown takes longer than 5 minutes after health checks stop, the watchdog fires process.exit(GENERAL_ERROR) instead of allowing a clean exit.

Confidence Score: 4/5

  • Safe to merge with minor follow-up; the keepalive and exit-strategy logic is correct and well-structured.
  • The core reconnection and keepalive logic is sound: the stale-socket guard prevents double close handling, per-request timers prevent hung promises, and rejectPendingRequests properly clears all timers. The only concrete issues are a CLAUDE.md violation (magic constant HEALTH_CHECK_TIMEOUT should be in shared TIMEOUTS) and a missing watchdog cleanup on graceful shutdown — neither blocks correctness of the happy path.
  • apps/server/src/api/routes/health.ts — watchdog timer has no cleanup mechanism and contains a magic constant that violates CLAUDE.md.

Important Files Changed

Filename Overview
apps/server/src/browser/backends/cdp.ts Core keepalive and reconnection logic added with per-request timeouts, dead-connection detection, and proper timer cleanup. Logic is sound; stale-socket guard prevents double-invocation of handleUnexpectedClose. A subtle design gap exists: if the freshly reconnected socket closes between the successful return of reconnectWithRetries and its .finally callback (resetting reconnecting = false), the close is silently swallowed — but this is a pre-existing race acknowledged in the code comments.
apps/server/src/api/routes/health.ts Adds a watchdog timer that exits the process after 5 minutes without a health check. HEALTH_CHECK_TIMEOUT is a magic constant that should live in shared TIMEOUTS per CLAUDE.md. Additionally, the watchdog timer has no cleanup path — if the server shuts down gracefully and the health poller stops, the watchdog could fire process.exit(GENERAL_ERROR) rather than letting the process exit cleanly.
apps/server/src/browser/browser.ts Small, clean addition of isCdpConnected() that delegates to cdp.isConnected(). No issues.
apps/server/src/api/server.ts Trivial one-line change to thread the browser instance into createHealthRoute. No issues.
packages/shared/src/constants/limits.ts Adds RECONNECT_MAX_RETRIES alongside existing CONNECT_MAX_RETRIES, both set to 3. Keeping them separate allows independent tuning. No issues.
packages/shared/src/constants/timeouts.ts Adds CDP_RECONNECT_DELAY, CDP_KEEPALIVE_INTERVAL, CDP_KEEPALIVE_TIMEOUT, CDP_REQUEST_TIMEOUT — all well-named and correctly placed. No issues.

Sequence Diagram

sequenceDiagram
    participant HC as Health Poller
    participant HR as HealthRoute (watchdog)
    participant B as Browser
    participant CDP as CdpBackend
    participant CR as Chromium

    Note over HR: watchdog starts on creation (5 min timer)

    loop every N seconds
        HC->>HR: GET /health
        HR->>HR: resetWatchdog()
        HR->>B: isCdpConnected()
        B->>CDP: isConnected()
        CDP-->>B: bool
        B-->>HR: bool
        HR-->>HC: {status:"ok", cdpConnected:bool}
    end

    Note over CDP: keepalive loop every 30s
    CDP->>CR: Browser.getVersion (10s race timeout)
    alt success
        CR-->>CDP: version response
        CDP->>CDP: clearTimeout(pendingTimer)
    else timeout / send failure
        CDP->>CDP: handleDeadConnection()
        CDP->>CDP: stopKeepalive()
        CDP->>CDP: ws.close() → ws=null
        CDP->>CDP: handleUnexpectedClose()
        CDP->>CDP: rejectPendingRequests()
        CDP->>CDP: reconnectWithRetries() [up to 3×, 5s delay each]
        alt reconnect succeeds
            CDP->>CR: connect WebSocket
            CR-->>CDP: open
            CDP->>CDP: startKeepalive()
        else all retries exhausted
            CDP->>CDP: process.exit(GENERAL_ERROR)
        end
    end

    alt no health check for 5 min
        HR->>HR: watchdog fires
        HR->>HR: process.exit(GENERAL_ERROR)
    end
Loading

Last reviewed commit: 193151d

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

6 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 26, 2026

Additional Comments (1)

apps/server/src/browser/backends/cdp.ts
resetting disconnecting to false here creates a race condition - if an old socket's onclose handler fires after a new connection establishes, it will set this.connected = false and this.ws = null, breaking the new connection

capture the socket reference in the onclose closure and check if (this.ws !== currentWs) return before modifying shared state

Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/server/src/browser/backends/cdp.ts
Line: 81

Comment:
resetting `disconnecting` to false here creates a race condition - if an old socket's `onclose` handler fires after a new connection establishes, it will set `this.connected = false` and `this.ws = null`, breaking the new connection

capture the socket reference in the `onclose` closure and check `if (this.ws !== currentWs) return` before modifying shared state

How can I resolve this? If you propose a fix, please make it concise.

@shadowfax92 shadowfax92 merged commit d7bb80e into main Feb 26, 2026
4 of 6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant